practice upon learning how to make realistic predictions. The rate of
improvement tended to be higher for high ability students, who gained
the most from repeated performance. It is suggested that, since the
study was limited to the familiar task of test taking, students were
more likely to assess their performance accurately on this activity
than on a less familiar one. Because many important decisions must be
made by the individual, on the basis of ability and interests, after
he has left the formal educational setting, a strong recommendation
is made for the teaching of self-appraisal techniques within the
regular school curriculum. The science classes are proposed as a
logical place to start such instruction. (TA)
When people leave the formal educational setting and enter
the worlds of work and leisure, they are required to make
many decisions based upon their own abilities and interests. 
Each of the decisions requires some assessment about the de- 
gree of success or enjoyment in the activity in which they 
are to become engaged. Hopefully, the evaluation of the po- 
tential activity will be rational and based upon a thorough 
knowledge of personal capabilities. However, experience in- 
dicates that self-evaluation is as difficult to learn as any 
other concept, and perhaps self-appraisal techniques need to 
be developed and taught within the school curriculum. 

Research on self-evaluation is meager, and that which 
has been done generally involves simple tasks not at all 
comparable to the complex activities that individuals later 
undertake. Furthermore, few studies of a longitudinal na- 
ture have been undertaken. 

The technique for studying level of aspiration was developed
by Lewin and his students (Rotter, 1942) and involves
a variable called a discrepancy score. The discrepancy score
is defined as a difference between some expected or predicted
score and some achieved score. Some researchers use a
discrepancy between achievement on event A and predicted
achievement of event B. Others use the discrepancy between
achievement on event A and predicted achievement of event A.
This technique may also be used to study self-evaluation.

Some important determinants of level of aspiration are
brought out by Lewin (1936). According to Lewin, level of
aspiration may be determined by the upper limit of the person's
achievements (ability) and by the level of achievement of
his social group (peer group). A third determinant may be
the relative success of the individual in accomplishing
similar goals in the past.

Murstein (1965) found that neither high nor low achiev- 
ing college students changed their predictions of final 
grades as a result of midsemester performance. This result 
was contradicted by Wolfe (in press) who found that college 
students became more accurate predictors as a result of
midsemester feedback.

Pennington's (1940) experiments on college students
indicated that failure resulted in a lower level of aspiration,
and success (passing with high grades) resulted in
an upward swing in predicted scores on the following examination.
With fifth grade children, Anderson and Brandt
(1939) found that poor students set goals consistently above 
past performance, and good students set goals consistently 
below past performance. 

In an attempt to determine the influence of sex and 
achievement on the ability to predict test scores for college
students, Sumner and Johnson (1949) found discrepancy
scores to be less for high achieving students than for low 
achieving students. They also found that females of all 
quartile levels are more accurate predictors than males of 
a comparable level. 

With secondary school students Pickup and Anthony
(1968) found that females who predicted higher scores than
they received tended to reduce subsequent predictions while
males did not. Low achievers were more likely than high
achievers to predict higher scores than they received.

Classroom measurements from test predictions may suf- 
fer from the experimenter effect. Research completed by 
Rosenfeld and Zander (1961) indicates that the level of aspiration
may be influenced by reward or power. The rewards
may be given via non-verbal cues emitted by the teacher in 
advance of and/or during the testing situation. 

METHOD

Two hundred ten students in eight general science 
classes and one earth science class from a rural Eastern 
New York secondary school were used as subjects. Classes 
varied in size from sixteen to thirty-two students and were 
taught by two teachers. Within each grade students were 
grouped by academic ability from previous performance. The 
top one-fourth of the students in each grade were grouped 
for enrichment courses and the remaining students were di- 
vided into two sections of comparable ability. 

At the beginning of the school year the teachers ex- 
plained to the students that on each unit test the students 
would be asked to predict the percentage score they would 
get on the test immediately before and immediately after 
taking it. Separate slips of paper were stapled to the test 
for the pretest guess, and when completed were torn off and 
collected. Space was available on the test booklet for 
recording the post-test predictions. Both predicted scores 
and the actual scores were transferred to permanent record 
sheets. Since percentage grades were used district-wide as
the method of reporting academic progress, the format for
making predictions was not unfamiliar to the students. The 
random variable employed was a discrepancy score which was 
defined as the absolute difference between a predicted score 
and the obtained score. 
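
The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `discrepancy` and the three sample scores are hypothetical and are not data from the study.

```python
def discrepancy(predicted: float, obtained: float) -> float:
    """Absolute difference between a predicted and an obtained percentage score."""
    return abs(predicted - obtained)


# Hypothetical record for one student on one unit test (not study data).
pretest_guess, posttest_guess, actual = 85, 78, 72

pre_disc = discrepancy(pretest_guess, actual)    # guess made before the test
post_disc = discrepancy(posttest_guess, actual)  # guess made after the test

print(pre_disc, post_disc)
```

Because the absolute value is taken, an overestimate and an underestimate of the same size count equally; the direction of the error is not retained.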

The number of tests given to each class ranged between 
eight and thirteen. All tests were constructed to be dis- 
criminatory in nature, and perfect scores were rarely achie- 
ved. Thus, ceiling effects were not a contaminating vari- 
able. However, report card grades were adjusted to account 
for the test difficulty. 

Students were told to base their predictions upon how 
well they understood the material and how difficult they 
thought the test would be (or was). Reminders were fre- 
quently given that the predictions would not affect actual 
grades in any way. 

In the few cases where the subject failed to make a 
prediction, the mean prediction was used and was derived 
from all the pretest or posttest predicted scores the sub- 
ject did make. 
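
A minimal sketch of this substitution rule, assuming a missing prediction is recorded as `None`; the helper name `fill_missing` and the sample values are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def fill_missing(predictions: List[Optional[float]]) -> List[float]:
    """Replace each missing prediction with the mean of the predictions
    the same subject did make (pretest and posttest handled separately)."""
    made = [p for p in predictions if p is not None]
    if not made:
        raise ValueError("subject made no predictions to average")
    subject_mean = mean(made)
    return [subject_mean if p is None else p for p in predictions]


# Hypothetical pretest predictions for one subject; the second test is missing.
pretest = [80, None, 70, 75]
filled = fill_missing(pretest)
print(filled)
```

The function would be called once on a subject's pretest predictions and once on the posttest predictions, so that each missing value is filled from predictions of the same kind.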

Within each section subjects were ranked from high to 
low on the final examination. Each section was then divided 
into four parts called quartiles. Within each section, however,
the quartiles contained unequal n due to tied scores
and the total section size not being divisible by four.
Thus, the trend analyses were non-orthogonal. In only one
section was the ratio of largest to smallest n as large as
two.
One of the uncontrollable variables may have influenced
the predictions at the beginning of the year. During previous
years of schooling the students may have been accustomed
to grades ranging from a low of sixty to one-hundred percent
(failure set at seventy-five). Since the effective passing
grade had suddenly been shifted from seventy-five to
fifty percent by the teachers in the experiment, the grades
achieved were lower in most cases. This unfamiliar situation
may have caused the predicted grades to be much higher
at the beginning of the year than they were at the end. No 
analysis of this variable was attempted. 

RESULTS 

Within each section a two-way factorial ANOVA was con- 
ducted using the quartiles as one main effect and the time 
of prediction (pretest and posttest) as the other. The da- 
ta were pooled across all tests for each section. Table 1 
presents the findings of the nine ANOVAs with the significance
level set at .05. Table 2 presents the ANOVA for
section 9.



Insert Tables 1 and 2 about here 



Significant differences were found among the quartiles within
seven of the nine sections and between the two times of
prediction for three sections. No significant interactions 
were found. 
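
As a rough arithmetic check on the section 9 analysis, the mean squares and F ratios can be recomputed from the sums of squares and degrees of freedom read from Table 2. The sketch below only repeats the standard computation (MS = SS / df, F = MS / MS within cell); the dictionary layout and variable names are illustrative, and small rounding differences from the printed values are to be expected.

```python
# Sums of squares and degrees of freedom as read from Table 2 (section 9).
table2 = {
    "Quartiles":   (2453.04, 3),
    "Time":        (1389.89, 1),
    "Interaction": (226.29, 3),
    "Within Cell": (55899.59, 662),
}

# Mean square for each source: SS / df.
mean_squares = {source: ss / df for source, (ss, df) in table2.items()}

# Each effect is tested against the within-cell (error) mean square.
ms_error = mean_squares["Within Cell"]
f_ratios = {
    source: ms / ms_error
    for source, ms in mean_squares.items()
    if source != "Within Cell"
}

# The component sums of squares and df should add up to the Total row.
total_ss = sum(ss for ss, _ in table2.values())
total_df = sum(df for _, df in table2.values())

for source, f in f_ratios.items():
    print(f"{source:12s} MS = {mean_squares[source]:8.2f}  F = {f:5.2f}")
print(f"Total SS = {total_ss:.2f}, total df = {total_df}")
```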

It might be concluded that, even over several trials,
individual differences will be maintained and that some
students will be able to predict scores more accurately than
other students after completing the task and after several
practice trials. Generally speaking, having completed the
task will not allow for a more accurate self appraisal
(before feedback) than prior to the task. Furthermore, the
relative improvement from pretest to posttest prediction
remains relatively constant for students of all ability levels.

In order to more completely examine the effect of practice
upon learning how to make realistic predictions, trend
analyses were conducted within each section. The assumption
was made that each practice trial resulted in an equal amount
of learning. The trend analyses were conducted upon the
first and third levels of a three way factorial design:
quartiles (A) by time of prediction (B) by test number (C).
The analyses were complicated by the fact that the quartiles
were of unequal size, requiring a non-orthogonal analysis
technique. In each case the hypotheses were tested in the
following order: cubic interaction (AXC), cubic trend (C),
quadratic interaction (AXC), quadratic trend (C), linear
interaction (AXC), linear trend (C), and the contrast of
the first and last predictions (C). Unfortunately, with
non-orthogonal analyses the order in which hypotheses are
tested is important, since the tests of significance are not
independent. In the four cases of multiple significant findings
the set of hypothesis tests was not reordered to verify
subsequent results. Note, however, that the tests of the first
two main effects had been conducted prior to the trend
analyses. Residual terms were not tested for significant higher
order effects. Table 3 summarizes the trend analyses for
the nine sections. At least one significant trend component
or contrast was found for eight of the nine sections.



Insert Table 3 about here



Since the unit examinations were of differential dif- 
ficulty, it was reasonable to expect that a simple function 
would not be found to describe the trend when the design was 
collapsed on the A effect (all subjects) and on the B effect 
(both pretest and posttest predictions). The expectation 
was borne out when at least a third degree polynomial was 
needed to describe the trend for four sections, and with 
four other sections a polynomial of at least the fourth de- 
gree would be needed. 

Although previous analyses (see Table 1) indicated that
seven of the nine sections had differences among the quar- 
tiles (pooled across tests) only four indicated differences 
on the interaction components of the trend analysis which 
were tested. The apparent contradiction may be explained 
by the higher order components of the trend interaction which 
were not tested. (For example, in section 7 there are four 
levels of A and 8 levels of C making 21 degrees of freedom 
for the interaction term. Only the linear component of the 
A effect was combined with the linear, quadratic, and cubic 
components of the C effect. The higher order effects would 
be at least quadratic in A and quartic in C simultaneously.) 
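
The degrees-of-freedom bookkeeping in the parenthetical example can be written out as a short worked equation. The symbols a and c (the number of levels of A and C) are notation introduced here for illustration; the counts 4, 8, and 21 come from the text.

```latex
\begin{align*}
df_{A \times C} &= (a - 1)(c - 1) = (4 - 1)(8 - 1) = 21, \\
df_{\text{tested}} &= \underbrace{1}_{\text{linear in } A} \times
                     \underbrace{3}_{\text{linear, quadratic, cubic in } C} = 3, \\
df_{\text{untested}} &= 21 - 3 = 18 .
\end{align*}
```

On this accounting, most of the interaction degrees of freedom in section 7 lie in components that were never examined, which leaves room for the quartile differences found in the pooled analyses.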

It might be concluded that the students in the quartile
levels learn at varying rates and that these differences can
be described by a linear function in less than one-half of
the sections.

Within the same trend analyses, the last predictions were
contrasted with the first predictions and were found to be
more accurate at the end of the year in seven of
nine sections.

Thus, with practice and without instruction as to "how,"
most students were able to improve their ability to evaluate
their own performance. The distribution of discrepancy
scores for each time of prediction (pooled across all tests
and all sections) is given in Table 4.



Insert Table 4 about here



Two way analyses of variance (sex by time of prediction) 
were performed after pooling data across tests and quartiles 
within each section. In no instance was a significant dif- 
ference found between males and females. 

Of the 210 students 24 made pretest predictions within
5 points of their actual score at least one-half of the time.
On the posttest predictions the number increased to 42. At
the other end of the spectrum 64 students were off by at least
15 points one-half (or more) of the time on the pretest
predictions. This number decreased to 37 on the posttest
predictions. When the four frequencies are placed in a table
(see Table 5a), the resulting chi-square value for a test of
independence (11.64) was significant at the .05 level. This
apparently contradictory result stems from the fact that the
chi-square analysis was based upon data pooled across all
sections while the analyses of variance were done on each
section independently (three of the nine analyses were
significant; see Table 1).
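
The test of independence on the four frequencies can be sketched as an ordinary Pearson chi-square for a 2 x 2 table. The function below is a generic textbook computation without a continuity correction, not necessarily the exact procedure used in the study, so the value it yields may differ slightly from the printed 11.64. The function name and the tabled critical value (3.841 for one degree of freedom at the .05 level) are supplied here for illustration.

```python
def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 frequency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Rows: pretest, posttest. Columns: good predictors, poor predictors (Table 5a).
table_5a = [[24, 64], [42, 37]]
chi2 = chi_square_2x2(table_5a)

critical_05_df1 = 3.841  # tabled chi-square critical value, df = 1, alpha = .05
print(round(chi2, 2), chi2 > critical_05_df1)
```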

When the pretest and posttest frequencies (see Table 5b)
of those within 5 points were divided into high and low
achievers (within their section) and again analyzed with a test
for independence, the chi-square value of 5.50 was significant
at the .05 level. A similar analysis on the other set of
frequencies (see Table 5c) failed to yield a significant
chi-square value. Thus, it may be concluded that some students
will profit from experience while others will not, but the
more able students have a higher likelihood of improvement.



Insert Tables 5a, 5b, and 5c about here 



According to Rotter (1942) and others, predicted scores
are often dependent upon the actual performance of the
previous trial. However, in the situations for which they
postulate this score, the task from trial to trial is identical.
In the present experiment the predictions are based upon new
cognitive understandings for each trial. Since achievement
scores are somewhat related from test to test, it is not
unreasonable that predictions will be related to one another,
and that discrepancy scores will be mediated by both
achievement and previous predictions. The assumption was made that
[...] discrepancy score for trial t,

A vector of discrepancy scores was constructed for each
student and the data coded as conditional frequencies with
five point intervals. The data for all students in each
section were pooled and conditional probability matrices
(transition matrices) were derived. A Markov chain analysis
provided limiting vectors of probabilities (tolerance = .0005)
for each section. (The limiting vector provides an estimate
of the proportion of time the group will predict any category
over an infinite number of trials.) The limiting vectors
were converted to cumulative probability vectors and the
pretest vector was compared with the posttest vector with a
Kolmogorov-Smirnov Two Sample Test. Table 6 illustrates the
cumulative proportion vectors for each section and for all
sections combined. Only sections 7, 8, and 9 and the combined
group produced vectors which were significantly different at
the .05 level. Table 7 illustrates the transition matrices
for the pretest and posttest predictions for the combined
groups. In each case where significance was found the
cumulative probabilities in the lower categories were larger for
the posttest prediction, which indicates that students (at least in
grade 9) learn at a faster rate following the task than they
do prior to the task. Perhaps a certain level of maturity
is required for self-evaluation accuracy.



Insert Table 6 about here



Insert Table 7 about here
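
One way a limiting vector can be obtained from a transition matrix is repeated multiplication until successive vectors agree within a tolerance. The sketch below assumes that approach (the study does not say which algorithm was used) and applies it to the pretest transition matrix for all subjects combined, with entries as reconstructed from Table 7. The function name, the uniform starting vector, and the iteration cap are illustrative choices.

```python
from itertools import accumulate

# Pretest transition matrix for all subjects combined, as read from Table 7.
# Row = discrepancy category on one test, column = category on the next test.
# Categories: 0-5, 6-10, 11-15, 16-20, 21-25, over 25.
P_PRE = [
    [0.307, 0.247, 0.166, 0.135, 0.058, 0.087],
    [0.274, 0.242, 0.158, 0.111, 0.071, 0.144],
    [0.322, 0.222, 0.117, 0.097, 0.117, 0.125],
    [0.243, 0.270, 0.134, 0.122, 0.074, 0.157],
    [0.226, 0.129, 0.129, 0.181, 0.090, 0.245],
    [0.248, 0.118, 0.118, 0.150, 0.118, 0.248],
]


def limiting_vector(P, tol=0.0005, max_iter=1000):
    """Iterate pi <- pi P from a uniform start until no element of pi
    changes by more than tol between successive steps."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("limiting vector did not converge")


pi = limiting_vector(P_PRE)
cumulative = list(accumulate(pi))  # cumulative probability vector

print([round(p, 3) for p in pi])
print([round(c, 2) for c in cumulative])
```

The cumulative vector produced at the end is the form in which a pretest vector and a posttest vector would be compared by the Kolmogorov-Smirnov test.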

CONCLUSIONS AND IMPLICATIONS 

The ability to accomplish accurate self-evaluation appears
to be a rarely encountered phenomenon in the junior
high school, but there is some evidence that students of this
level can learn how to do it. In this experiment explicit
instructions about how to make predictions were not given,
but several students were able to improve their predictions
over time anyway. Although there were no differences by sex,
the more able students tended to be more accurate than the
less able students. Furthermore, the rate of improvement
tended to be faster for high ability students. However,
being a high ability student in no way guarantees his being
able to discover how to accurately assess his performance,
and being a low ability student does not ensure his being
unable to discover the process. As might be expected, those
students who were relatively accurate at the start of the
experiment tended to gain the most from the repeated practice.

Evaluation following performance tends to be more accurate
than evaluation prior to performance, but the evidence
is not clear about this point. Inhelder and Piaget (1958)
have produced inconclusive evidence that concepts are more
effectively used by adolescents than by younger people. They
found that formal reasoning begins to appear about age 11 or
12, and builds up to a plateau at age 14 or 15. Evidence
from the present study seems to support Inhelder and Piaget,
but the age groupings within each grade were not as clear as
they should have been to fully examine their contention.

This study was limited to the familiar task of test- 
taking. Since all students have taken many tests it is 
reasonable to conclude that students are more likely to 
assess their performance accurately on these activities 
than on those with which they are unfamiliar. 

Although we know that transfer of training rarely
takes place unless it is taught as a separate technique,
this author believes that self-evaluation techniques are
not taught in any form in the educational program today.
With the great emphasis today on objective decision making, 
it would seem important to examine personal capabilities 
and personal performance in an objective light. It would 
also appear that the science classes would be the logical 
place to undertake instruction on self-evaluation since 
objective measurement forms one of the cornerstones of 
this field. 
TABLE 1

Results of the Nine Analyses of Variance
Quartiles by Time of Prediction

Section    Quartile    Time     Interaction

   1        <.05        ns          ns
   2        <.05        ns          ns
   3         ns         ns          ns
   4        <.05        ns          ns
   5        <.05       <.05         ns
   6         ns         ns          ns
   7        <.05       <.05         ns
   8        <.05        ns          ns
   9        <.05       <.05         ns

TABLE 2

Analysis of Variance for Section Nine
Quartiles by Time of Prediction

Source           SS          df       MS         F        p

Quartiles       2453.04       3      817.68      9.69    <.05
Time            1389.89       1     1389.89     16.46    <.05
Interaction      226.29       3       75.43       .89     ns
Within Cell    55899.59     662       84.44
Total          59968.81     669

TABLE 3

Summary of Trend Analyses (non-orthogonal) for Each Section*

TABLE 4

Frequency Tabulation of Discrepancy Scores
For Each Time of Prediction

                       Discrepancy Interval

             0-5     6-10    11-15    16-20    21-25    > 25

Pretest      556     434      270      254      167      300
Posttest     643     453      310      227      148      220

TABLE 5a

Frequency Table for Good and Poor Predictors

              Good (≤ 5)    Poor (≥ 15)

Pretest           24            64          χ² = 11.64
Posttest          42            37          p ≤ .05


TABLE 5b

Frequency Table by Achievement Level for Good Predictors

               Pretest     Posttest

Top half          10          31            χ² = 5.50
Bottom half       14          11            p ≤ .05


TABLE 5c

Frequency Table by Achievement Level for Poor Predictors

               Pretest     Posttest

Top half          19           9            χ² = .34
Bottom half       45          28            p is ns

TABLE 6

Cumulative Proportion Vectors For Kolmogorov-
Smirnov Two Sample Tests By Section

Section                 0-5    6-10   11-15   16-20   21-25   over 25

1         Pretest       .27    .43     .55     .68     .79     1.00
          Posttest      .25    .46     .60     .72     .80     1.00
2         Pretest       .36    .64     .72     .85     .96     1.00
          Posttest      .35    .61     .77     .85     .93     1.00
3         Pretest       .31    .59     .84     .92     .96     1.00
          Posttest      .42    .69     .83     .92     .96     1.00
4         Pretest       .23    .48     .61     .74     .89     1.00
          Posttest      .27    .50     .65     .75     .85     1.00
5         Pretest       .26    .47     .63     .73     .83     1.00
          Posttest      .32    .55     .72     .80     .90     1.00
6         Pretest       .26    .45     .62     .74     .82     1.00
          Posttest      .26    .46     .60     .74     .82     1.00
7         Pretest       .22    .40     .55     .68     .79     1.00
          Posttest      .34    .56     .72     .85     .93     1.00
8         Pretest       .26    .42     .54     .74     .83     1.00
          Posttest      .29    .52     .66     .80     .86     1.00
9         Pretest       .34    .61     .75     .85     .90     1.00
          Posttest      .44    .67     .83     .90     .95     1.00
Combined  Pretest       .28    .49     .64     .76     .85     1.00
          Posttest      .33    .55     .71     .82     .89     1.00
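
The Kolmogorov-Smirnov comparison reduces to finding the largest absolute gap between two cumulative vectors. The sketch below does this for the combined pretest and posttest vectors as read from Table 6. The large-sample critical value uses the usual coefficient 1.36 for the .05 level, and the sample sizes 1788 and 1794 are borrowed from the footnote to Table 7; treating those as the test's sample sizes is an assumption made here for illustration, not something the report states.

```python
from math import sqrt

# Combined cumulative proportion vectors as read from Table 6.
pre = [0.28, 0.49, 0.64, 0.76, 0.85, 1.00]
post = [0.33, 0.55, 0.71, 0.82, 0.89, 1.00]


def ks_statistic(cdf_a, cdf_b):
    """Largest absolute difference between two cumulative vectors."""
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def ks_critical_05(n_a, n_b):
    """Large-sample two-sample critical value at the .05 level."""
    return 1.36 * sqrt((n_a + n_b) / (n_a * n_b))


d = ks_statistic(pre, post)
crit = ks_critical_05(1788, 1794)  # assumed sample sizes (Table 7 footnote)

print(round(d, 2), round(crit, 4), d > crit)
```

With smaller samples the critical value is larger, so a gap of this size would not necessarily be significant within a single section.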






TABLE 7

TRANSITION MATRICES AND LIMITING VECTORS FOR ALL
SUBJECTS COMBINED ON PRETEST AND POSTTEST PREDICTIONS*

transition matrix for pretest predictions

            0-5     6-10    11-15   16-20   21-25   over 25

0-5         .307    .247    .166    .135    .058     .087
6-10        .274    .242    .158    .111    .071     .144
11-15       .322    .222    .117    .097    .117     .125
16-20       .243    .270    .134    .122    .074     .157
21-25       .226    .129    .129    .181    .090     .245
> 25        .248    .118    .118    .150    .118     .248

transition matrix for posttest predictions

            0-5     6-10    11-15   16-20   21-25   over 25

0-5         .376    .252    .140    .098    .047     .087
6-10        .336    .214    .181    .101    .085     .083
11-15       .334    .264    .156    .090    .073     .083
16-20       .288    .255    .160    .099    .080     .118
21-25       .296    .193    .126    .178    .081     .126
> 25        .213    .127    .145    .154    .113     .248

limiting vectors for both predictions

            0-5     6-10    11-15   16-20   21-25   over 25

Pre         .278    .216    .142    .128    .083     .151
Post        .327    .227    .153    .109    .072     .109

* N is 1788 and 1794 for pretest and posttest predictions
respectively.